1. BUSINESS TASK

The purpose of this report is to address a case study for Cyclistic, a fictional bike-share company based in Chicago. The steps of the analysis process are as follows: Ask, Prepare, Process, Analyze, Share, and Act.

To date, Cyclistic’s marketing strategy has focused on building general awareness and appealing to broad consumer segments. One factor facilitating this was the flexibility of its pricing plans, including single-ride passes, full-day passes, and annual memberships.

Cyclistic’s financial analysts have determined that annual members are significantly more profitable than casual riders.

The marketing director, Lily Moreno, believes that the company’s future success hinges on maximizing the conversion of casual riders to annual members. She notes that casual riders are already familiar with the Cyclistic program and have chosen it for their mobility needs.

The goal of this case study is to investigate how annual members and casual riders use Cyclistic bikes differently. Based on these insights, we will design a new marketing strategy to convert casual riders into annual members. Understanding these differences will be crucial for driving business decisions and maximizing profitability.

2. DESCRIPTION OF ALL DATA SOURCES USED

DATA ALLOCATION

The case study includes the following data sets from the Google platform for academic purposes.

Divvy_Trips_2019_Q1

Divvy_Trips_2020_Q1

DATA STRUCTURE

# Inspect the dataframes and look for incongruities
glimpse(q1_2019)
## Rows: 365,069
## Columns: 12
## $ trip_id           <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 21…
## $ start_time        <chr> "2019-01-01 0:04:37", "2019-01-01 0:08:13", "2019-01…
## $ end_time          <chr> "2019-01-01 0:11:07", "2019-01-01 0:15:34", "2019-01…
## $ bikeid            <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205,…
## $ tripduration      <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, …
## $ from_station_id   <dbl> 199, 44, 15, 123, 173, 98, 98, 211, 150, 268, 299, 2…
## $ from_station_name <chr> "Wabash Ave & Grand Ave", "State St & Randolph St", …
## $ to_station_id     <dbl> 84, 624, 644, 176, 35, 49, 49, 142, 148, 141, 295, 4…
## $ to_station_name   <chr> "Milwaukee Ave & Grand Ave", "Dearborn St & Van Bure…
## $ usertype          <chr> "Subscriber", "Subscriber", "Subscriber", "Subscribe…
## $ gender            <chr> "Male", "Female", "Female", "Male", "Male", "Female"…
## $ birthyear         <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995…
glimpse(q1_2020)
## Rows: 426,887
## Columns: 13
## $ ride_id            <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at         <chr> "2020-01-21 20:06:59", "2020-01-30 14:22:39", "2020…
## $ ended_at           <chr> "2020-01-21 20:14:30", "2020-01-30 14:26:22", "2020…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ start_station_id   <dbl> 239, 234, 296, 51, 66, 212, 96, 96, 212, 38, 117, 1…
## $ end_station_name   <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ end_station_id     <dbl> 326, 318, 117, 24, 212, 96, 212, 212, 96, 100, 632,…
## $ start_lat          <dbl> 41.9665, 41.9616, 41.9401, 41.8846, 41.8856, 41.889…
## $ start_lng          <dbl> -87.6884, -87.6660, -87.6455, -87.6319, -87.6418, -…
## $ end_lat            <dbl> 41.9671, 41.9542, 41.9402, 41.8918, 41.8899, 41.884…
## $ end_lng            <dbl> -87.6674, -87.6644, -87.6530, -87.6206, -87.6343, -…
## $ member_casual      <chr> "member", "member", "member", "member", "member", "…

ROCCC DATA (Reliable, Original, Comprehensive, Current, Cited)

Reliable

Yes: The City of Chicago and Divvy, the official bike-sharing operator, provide the data, which lends credibility to its reliability.

Original

Yes: The datasets are the original records of Divvy bike trips, directly generated by the system’s tracking mechanisms.

Comprehensive

Mostly Yes: The datasets contain a significant amount of information about each trip, including time stamps, station locations, and user types.

Current

Partially: The “2019_Q1” and “2020_Q1” datasets are valuable historical snapshots, but they are not updated after release.

Explanation: The City of Chicago and Divvy publish ongoing data releases, so the most recent datasets must be downloaded to obtain current information. The datasets used in this study are therefore historical.

Cited

Yes: The data is available from reputable sources, namely the City of Chicago Data Portal, a recognized source of public data, and Divvy’s official website.

LICENSING, PRIVACY, SECURITY, and ACCESSIBILITY

Motivate International Inc. has made the data available under this license.

These datasets are public data that can be used to explore how different customer types use Cyclistic bikes.

Data-privacy issues prohibit the use of riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

DATA INTEGRITY

  • In both datasets, time stamps are stored as text in the form “YYYY-MM-DD hh:mm:ss”, although the 2019 file does not zero-pad single-digit hours (e.g., “2019-01-01 0:04:37”).

  • The Divvy_Trips_2019_Q1 data set records trip duration in seconds in the tripduration field, which is absent from Divvy_Trips_2020_Q1.

  • User classifications changed from Customer and Subscriber in 2019 to casual and member in 2020, so the 2019 labels must be mapped to the 2020 ones.

  • The ride_id and rideable_type fields (trip_id and bikeid in 2019) have different data types: numeric in 2019 and character in 2020.

  • Selecting the essential fields and standardizing column names lets us combine both datasets into a single, consistent data frame for analysis.
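After the 2019 columns are renamed to the 2020 names in the rename() step of section 3, base R set operations can identify which fields each quarter must drop. This is a minimal sketch. The two name vectors were typed in by hand from the glimpse() output and the renamed 2019 tibble; they are not read from the data.

```r
# Column names typed in by hand: 2019 names after the rename() step,
# 2020 names as shown by glimpse(q1_2020).
cols_2019 <- c("ride_id", "started_at", "ended_at", "rideable_type", "tripduration",
               "start_station_id", "start_station_name", "end_station_id",
               "end_station_name", "member_casual", "gender", "birthyear")
cols_2020 <- c("ride_id", "rideable_type", "started_at", "ended_at",
               "start_station_name", "start_station_id", "end_station_name",
               "end_station_id", "start_lat", "start_lng", "end_lat", "end_lng",
               "member_casual")

only_2019 <- setdiff(cols_2019, cols_2020)  # fields to drop from 2019
only_2020 <- setdiff(cols_2020, cols_2019)  # fields to drop from 2020
shared    <- intersect(cols_2019, cols_2020) # fields kept in the combined data

print(only_2019)       # "tripduration" "gender" "birthyear"
print(only_2020)       # "start_lat" "start_lng" "end_lat" "end_lng"
print(length(shared))  # 9
```

The nine shared fields match the nine columns in the NA-count tables of section 3.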

3. CLEANING AND DATA MANIPULATION

We chose RStudio because it integrates data analysis, visualization, and reporting in a single tool and handles large data sets well. Programming also saves time and effort when working with data.

A copy of the CSV files was saved on the Desktop in the directory ..DESKTOP/BIKE _SHARE/ORIGINAL DATA.

The following chunks transform both quarters so that they stack correctly when combined into a single data frame.

(q1_2019 <- rename(q1_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid
                   ,started_at = start_time
                   ,ended_at = end_time
                   ,start_station_name = from_station_name
                   ,start_station_id = from_station_id
                   ,end_station_name = to_station_name
                   ,end_station_id = to_station_id
                   ,member_casual = usertype
))
## # A tibble: 365,069 × 12
##     ride_id started_at      ended_at rideable_type tripduration start_station_id
##       <dbl> <chr>           <chr>            <dbl>        <dbl>            <dbl>
##  1 21742443 2019-01-01 0:0… 2019-01…          2167          390              199
##  2 21742444 2019-01-01 0:0… 2019-01…          4386          441               44
##  3 21742445 2019-01-01 0:1… 2019-01…          1524          829               15
##  4 21742446 2019-01-01 0:1… 2019-01…           252         1783              123
##  5 21742447 2019-01-01 0:1… 2019-01…          1170          364              173
##  6 21742448 2019-01-01 0:1… 2019-01…          2437          216               98
##  7 21742449 2019-01-01 0:1… 2019-01…          2708          177               98
##  8 21742450 2019-01-01 0:1… 2019-01…          2796          100              211
##  9 21742451 2019-01-01 0:1… 2019-01…          6205         1727              150
## 10 21742452 2019-01-01 0:1… 2019-01…          3939          336              268
## # ℹ 365,059 more rows
## # ℹ 6 more variables: start_station_name <chr>, end_station_id <dbl>,
## #   end_station_name <chr>, member_casual <chr>, gender <chr>, birthyear <dbl>
q1_2019 <- mutate(q1_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
q1_2019 <- q1_2019 %>%
  mutate(member_casual = recode(member_casual
                                ,"Subscriber" = "member"
                                ,"Customer" = "casual"))
# Check the unique values of the column (validating the field)
unique(q1_2019$member_casual)
## [1] "member" "casual"
# Remove birthyear, gender, and tripduration from 2019 (absent in 2020) and the lat/long fields from 2020 (absent in 2019)
q1_2019 <- q1_2019 %>%
  select(-c(birthyear, gender, tripduration))
q1_2020 <- q1_2020 %>%
  select(-c(start_lat, start_lng, end_lat, end_lng))
conteo_na19 <- q1_2019 %>%
  summarise(across(everything(), ~ sum(is.na(.))))

conteo_na20 <- q1_2020 %>%
  summarise(across(everything(), ~ sum(is.na(.))))

options(width = 10000) # Show all fields on screen
print(conteo_na19)
## # A tibble: 1 × 9
##   ride_id started_at ended_at rideable_type start_station_id start_station_name end_station_id end_station_name member_casual
##     <int>      <int>    <int>         <int>            <int>              <int>          <int>            <int>         <int>
## 1       0          0        0             0                0                  0              0                0             0
print(conteo_na20)
## # A tibble: 1 × 9
##   ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id member_casual
##     <int>         <int>      <int>    <int>              <int>            <int>            <int>          <int>         <int>
## 1       0             0          0        0                  0                0                1              1             0
options(width = 80) #Reset to default

The 2020 data contains two missing values: one in end_station_name and one in end_station_id. We remove the affected rows from the 2020 data.

q1_2020 <- na.omit(q1_2020)

paste("Any NA value:",any(is.na(q1_2020)))
## [1] "Any NA value: FALSE"
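na.omit() removes whole rows. Two NA cells therefore cost one or two rows, depending on whether they share a row. Below is a minimal base-R sketch on a hypothetical three-row data frame. It counts NAs per column, a counterpart of the summarise(across()) chunk above, and reports how many rows na.omit() actually drops.

```r
# Hypothetical toy data; station names are placeholders, not Divvy stations
df <- data.frame(
  ride_id          = c("A1", "A2", "A3"),
  end_station_name = c("Station A", NA, "Station B"),
  end_station_id   = c(326, NA, 318),
  stringsAsFactors = FALSE
)

# NA count per column (base-R counterpart of summarise(across(everything(), ~ sum(is.na(.)))))
na_per_column <- colSums(is.na(df))
print(na_per_column)

rows_before  <- nrow(df)
df_clean     <- na.omit(df)            # drops every row with at least one NA
rows_removed <- rows_before - nrow(df_clean)
print(rows_removed)                    # 1, because both NAs sit on the same row here
```

Comparing nrow(q1_2020) before and after na.omit() in the same way would confirm how many rows the real cleaning step removed.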

4. SUMMARY OF THE ANALYSIS

The comparative analysis of bike usage between members and casual riders focused on trip duration and number of trips (across different periods), peak usage times, and main routes, to uncover usage patterns, relationships, and trends. These insights will help Cyclistic understand how the two customer groups behave differently. Cyclistic can then tailor incentives and communication so that the benefits of membership match casual riders’ needs, with the ultimate aim of driving conversions.

DATA ORGANIZATION

This section documents the cleaning tasks performed after the data sets are combined and new calculated fields are added.

  • The following code stacks the original data sets Divvy_Trips_2019_Q1 and Divvy_Trips_2020_Q1, restricted to the fields relevant to the study, into a single data set, all_trips.
# Stack individual quarters' data frames into one big data frame
all_trips <- bind_rows(q1_2019, q1_2020)
  • Because ride_id was converted to character in the 2019 data and is now the key field, we check it for duplicates.
paste("Any duplicated ride_id:",any(duplicated(all_trips$ride_id)))
## [1] "Any duplicated ride_id: FALSE"
  • We add a calculated field, ride_length, because the 2020 data lacks the tripduration field (which we dropped from the 2019 data for consistency).
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")
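A note on units: difftime() defaults to units = "auto". That setting picks the largest unit in which every difference exceeds one, so the unit of ride_length depends on the data passed in. The base-R sketch below uses the first two 2019 trips from the glimpse() output, whose tripduration values are 390 and 441 seconds. With the default the result comes out in minutes, while units = "secs" reproduces tripduration exactly.

```r
# First two 2019 trips from glimpse(q1_2019); tripduration is 390 and 441 seconds
started_at <- c("2019-01-01 0:04:37", "2019-01-01 0:08:13")
ended_at   <- c("2019-01-01 0:11:07", "2019-01-01 0:15:34")

fmt   <- "%Y-%m-%d %H:%M:%S"  # also parses the non-zero-padded hour "0"
start <- as.POSIXct(started_at, format = fmt, tz = "UTC")
end   <- as.POSIXct(ended_at,   format = fmt, tz = "UTC")

# Default units = "auto": every difference exceeds 1 minute, so minutes are chosen
auto_len <- difftime(end, start)
print(units(auto_len))       # "mins"
print(as.numeric(auto_len))  # 6.5 and 7.35 (minutes)

# Fixing the unit makes the result independent of the sample
secs_len <- as.numeric(difftime(end, start, units = "secs"))
print(secs_len)              # 390 and 441, matching tripduration
```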
CALCULATED FIELD ride_length CLEANING

The following code converts ride_length from a difftime object to a plain numeric vector (in seconds). This lets us run calculations on it and check it against several criteria for bad data.

all_trips$ride_length <- as.numeric(all_trips$ride_length, units = "secs")

paste("Numeric ride_length?", is.numeric(all_trips$ride_length))
## [1] "Numeric ride_length? TRUE"
paste(sum(is.na(all_trips$ride_length)), "NA values") # See how many NAs were created
## [1] "0 NA values"
paste(sum(all_trips$member_casual == "casual" & all_trips$ride_length > 14400), "casual rides longer than 4 hours")
## [1] "1044 casual rides longer than 4 hours"
paste(sum(all_trips$member_casual == "member" & all_trips$ride_length > 3600), "member rides longer than 1 hour")
## [1] "2366 member rides longer than 1 hour"
paste(sum(all_trips$ride_length < 0), "negative ride durations")
## [1] "116 negative ride durations"
paste(sum(all_trips$start_station_name == "HQ QR"), "taken out of docks and checked for quality by Divvy.")     
## [1] "3766 taken out of docks and checked for quality by Divvy."

An “HQ QR” station is not a typical public docking station where riders begin or end their trips. It most likely marks Divvy’s operational activities, such as bike maintenance or redistribution. We therefore exclude trips that start at this station.

The data frame includes 116 negative ride_length entries, a tiny proportion that we will remove because they are clear data errors.

Next, we set upper thresholds on ride length. Very long rides skew the statistics and are not representative, given Divvy Bikes’ pricing terms:

Day Pass Terms: Divvy’s Day Pass typically includes unlimited rides of up to 3 hours each within 24 hours. Rides exceeding this duration incur additional per-minute charges.

Annual Membership Terms: Divvy annual memberships usually include the first 45 minutes of each ride for free. Rides longer than 45 minutes incur additional per-minute charges.

To address this, we set the upper limit of ride_length to 14,400 seconds (4 hours) for casual riders and 3,600 seconds (1 hour) for members. This leaves a margin above the 3-hour and 45-minute included ride times. These criteria remove 1.5% of the casual observations and 0.3% of the member records.

Setting upper limits improves the representativeness of the analysis. It focuses on typical usage patterns for casual riders and members while removing only a small percentage of potentially anomalous data.

The chunk below removes data based on the criteria above, creating a new version (v2) of a data frame.

# https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2/
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" |
                              all_trips$ride_length < 0 |
                              all_trips$ride_length > 14400 |
                              (all_trips$member_casual == "member" & all_trips$ride_length > 3600)), ]
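The removal rule above can also be written as a per-user-type lookup of the maximum ride length. Adding a new user type or changing a threshold then means editing one named vector. This is a base-R sketch on a hypothetical seven-row data frame, not the report's own code; the column names follow all_trips.

```r
# Hypothetical toy data standing in for all_trips
trips <- data.frame(
  start_station_name = c("HQ QR", "Station A", "Station B", "Station C",
                         "Station D", "Station E", "Station F"),
  member_casual = c("casual", "casual", "casual", "casual",
                    "member", "member", "member"),
  ride_length   = c(500, -30, 15000, 1373, 4000, 506, 3600),  # seconds
  stringsAsFactors = FALSE
)

# Upper limit per user type, in seconds (4 hours for casual, 1 hour for member)
max_len <- c(casual = 14400, member = 3600)

# Keep rows that are not HQ QR, not negative, and within the user type's limit
keep <- trips$start_station_name != "HQ QR" &
  trips$ride_length >= 0 &
  trips$ride_length <= max_len[trips$member_casual]

trips_v2 <- trips[keep, ]
print(trips_v2)  # Station C, Station E, Station F remain
```

As in the original condition, a member ride of exactly 3,600 seconds is kept, because only rides strictly above the limit are removed.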
ADDING FIELDS OF INTEREST

In the following chunk, we add date, month, year, and day-of-week fields to all_trips_v2 and save the result in the directory ..DESKTOP/BIKE _SHARE/ORIGINAL DATA in case it is needed for further analysis.

all_trips_v2$date <- as.Date(all_trips_v2$started_at) #The default format is yyyy-mm-dd
all_trips_v2$month <- format(all_trips_v2$date, "%m")
all_trips_v2$year <- format(all_trips_v2$date, "%Y")
all_trips_v2$day_of_week <- format(all_trips_v2$date, "%A")
write_csv(all_trips_v2,"all_trips_v2.csv")

DESCRIPTIVE ANALYSIS

The initial phase of the analysis involved a comprehensive descriptive exploration of the all_trips_v2 data set to establish a foundational understanding of bike usage patterns. This included examining key variables such as trip duration (ride_length), trip frequency, and temporal aspects like monthly, weekly, and daily usage trends. We also investigated the distribution of trip start times to identify peak usage periods and analyzed the prevalence of different routes. Statistical summaries, including central tendency and dispersion measures, were calculated for ride_length to understand its overall distribution and compare it between member and casual riders. This descriptive overview provides the context for the subsequent visual exploration and identification of key findings regarding user behavior and usage patterns.

5. SUPPORTING VISUALIZATIONS AND KEY FINDINGS

TOTAL BIKE USAGE COMPARISON

total_usage <- all_trips_v2 %>%
  group_by(member_casual) %>%
  summarize(Total_Ride_Length = sum(ride_length), Number_of_Rides = n())

print(total_usage)
## # A tibble: 2 × 3
##   member_casual Total_Ride_Length Number_of_Rides
##   <chr>                     <dbl>           <int>
## 1 casual                128166358           66833
## 2 member                464221012          717946
  • Members account for a much larger total ride time (464,221,012 vs. 128,166,358 seconds) and number of trips (717,946 vs. 66,833) than casual riders.
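Dividing the totals above gives the mean ride length per group. These values match the means in the statistical summary that follows. The totals also give each group's share of trips and of ride time. This base-R sketch types the printed totals in by hand. By this arithmetic, casual riders make roughly 8.5% of trips but account for roughly 21.6% of total ride time in the filtered data.

```r
# Totals typed in from the total_usage output above (ride length in seconds)
total_usage <- data.frame(
  member_casual     = c("casual", "member"),
  Total_Ride_Length = c(128166358, 464221012),
  Number_of_Rides   = c(66833, 717946),
  stringsAsFactors  = FALSE
)

# Mean ride length = total ride time / number of rides
total_usage$mean_ride_sec  <- total_usage$Total_Ride_Length / total_usage$Number_of_Rides
# Each group's share of all trips and of all ride time
total_usage$share_of_rides <- total_usage$Number_of_Rides / sum(total_usage$Number_of_Rides)
total_usage$share_of_time  <- total_usage$Total_Ride_Length / sum(total_usage$Total_Ride_Length)

print(total_usage)
```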

STATISTICAL COMPARISON OF BIKE USAGE

The statistical comparison is based on basic statistics (Minimum, Maximum, Quartiles, Median, and Mean) of ride duration. It helps uncover the distribution of ride duration by user type.

# Apply the summary function to the ride_length for each group
summary_by_user <- all_trips_v2 %>%
  group_by(member_casual)%>%
  summarise(summary_ride_length = list(summary(ride_length))) %>%
  unnest_wider(summary_ride_length)

options(digits = 10)
print(summary_by_user)
## # A tibble: 2 × 7
##   member_casual Min.        `1st Qu.`   Median      Mean         `3rd Qu.` Max. 
##   <chr>         <table[1d]> <table[1d]> <table[1d]> <table[1d]>  <table[1> <tab>
## 1 casual        2           772         1373        1917.7106818 2276      14385
## 2 member        1           317          506         646.5960003  819       3600

KEY TAKEAWAYS

Ride Duration Difference: Even after setting upper limits, casual riders consistently exhibit longer ride duration than members across all statistical measures (median, mean, quartiles).

Typical Usage Patterns: Members predominantly use the bikes for shorter trips (median around 8.4 minutes), likely for commuting or quick errands. Casual riders have a much longer typical ride duration (median around 22.9 minutes), suggesting more leisure-oriented or longer single-use trips.

Distribution Shape: The ride duration distribution for casual riders remains more right-skewed than for members, indicating a greater tendency for longer rides within their allowed window.

Impact of Upper Limits: The upper limits are now reflected in the maximum values, effectively removing the extreme outliers that were present before. This provides a more focused view on most ride durations within a reasonable time frame for each user type.
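The minute figures quoted above come from dividing the summary values (in seconds) by 60. This base-R sketch types the quartiles, medians, and means in by hand from the summary_by_user output. It converts them to minutes and computes the ratio of the two medians: the casual median is about 2.7 times the member median.

```r
# Summary values in seconds, typed in from the summary_by_user output
summary_sec <- rbind(
  casual = c(q1 = 772, median = 1373, mean = 1917.7106818, q3 = 2276),
  member = c(q1 = 317, median = 506,  mean = 646.5960003,  q3 = 819)
)

# Convert to minutes, rounded to one decimal place
summary_min <- round(summary_sec / 60, 1)
print(summary_min)  # medians: 22.9 (casual) and 8.4 (member)

# How many times longer the typical casual ride is than the typical member ride
median_ratio <- summary_sec["casual", "median"] / summary_sec["member", "median"]
print(round(median_ratio, 2))  # 2.71
```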

BIKE USAGE COMPARISON OVER TIME

In the following chunks, we analyze bike usage by month and by day. We use the median as the measure of central tendency because, unlike the mean, it is robust to the right-skewed ride-length distribution and to the long rides that remain within the upper limits.
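A small illustration of why the median is preferred here. It uses hypothetical ride lengths, not Divvy data. One long ride inflates the mean by a factor of almost six but moves the median by only 10 seconds.

```r
# Hypothetical ride lengths in seconds
rides     <- c(300, 450, 500, 520, 600)
with_long <- c(rides, 14000)  # add one very long (but still allowed) casual ride

print(c(mean = mean(rides),     median = median(rides)))      # 474 and 500
print(c(mean = mean(with_long), median = median(with_long)))  # about 2728.3 and 510
```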

MONTHLY BIKE USAGE

In this section, we show two bar graphs to compare the monthly bike usage of members and casual riders—one with the ride_length behavior, and the other with the number of rides.

# Monthly bike usage per user type (uses ggplot2, scales::label_number, and patchwork)

monthly_usage <- all_trips_v2 %>%
  group_by(month, member_casual) %>%
  summarise(ride_length_median = median(ride_length), .groups = "drop")

plot_length <- ggplot(monthly_usage, aes(x = month, y = ride_length_median, fill = member_casual)) +
  geom_col(position = "dodge")

plot_ride_count <- ggplot(all_trips_v2, aes(x = month, fill = member_casual)) +
  geom_bar(position = "dodge") +
  scale_y_continuous(labels = label_number(accuracy = 1))

combined_plot <- plot_length + plot_ride_count +
  plot_layout(ncol = 2) + # Places the plots in two columns
  plot_annotation(title = "Monthly Comparison of Ride Length and Number of Trips by User")

# print() ignores width/height; set the figure size with the chunk options fig.width and fig.height
print(combined_plot)

KEY TAKEAWAYS

Different Usage Patterns: Members use the bike-sharing service much more frequently, suggesting a potential use for commuting or short, regular trips. Casual riders use the service less often but for significantly longer durations, indicating more leisure-oriented or longer, single-use trips.

Temporal Trends: Both the median ride length and the number of rides show variation across the months, suggesting seasonal or monthly patterns in usage. The increase in casual ridership towards March could be due to improving weather.

Dominance of Members: Members account for most of the rides taken during these first three months.

DAY OF WEEK BIKE USAGE
# Median ride length by day of week for members vs. casual riders
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,
                                    levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                               "Thursday", "Friday", "Saturday"))
all_trips_v2 %>%
  group_by(day_of_week, member_casual) %>%
  summarise(median_ride_length = median(ride_length), .groups = "drop") %>%
  ggplot(aes(x = day_of_week, y = median_ride_length, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Median Ride Length by Day of Week and User Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
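One caveat about the day_of_week field: format(date, "%A") returns day names in the R session's locale. In a non-English locale, the English levels passed to ordered() would not match, and the values would silently become NA. This base-R sketch is an alternative, not the report's method. It builds English day names from as.POSIXlt()$wday (0 = Sunday), so the result is locale-independent. The three dates are examples only.

```r
dates <- as.Date(c("2019-01-01", "2019-01-05", "2020-01-21"))  # example dates

# English day names indexed by POSIXlt wday (0 = Sunday ... 6 = Saturday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_names[as.POSIXlt(dates)$wday + 1],
                      levels = day_names, ordered = TRUE)

print(day_of_week)  # Tuesday Saturday Tuesday, with levels Sunday < ... < Saturday
```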

KEY TAKEAWAYS

The median reflects a clear pattern in bike usage across the week.

  • Casual riders have a much longer median ride length than members on every day of the week.

  • Casual riders’ median ride length peaks on weekends, suggesting a preference for longer rides at that time.

  • Members’ median ride length is more consistent across the week, reflecting stable riding habits.

DAILY MEAN OF RIDE LENGTH
data <- all_trips_v2 %>%
  group_by(member_casual, date, year) %>%
  summarise(daily_avg = mean(ride_length), .groups = "drop") %>%
  mutate(year = as.factor(year)) # Convert year to factor
contrasting_colors <- c("member" = "#1f78b4", "casual" = "#e31a1c") 

plots <- lapply(unique(data$year), function(y) {
  p <- plot_ly(data %>% filter(year == y),
               x = ~date,
               y = ~daily_avg,
               color = ~member_casual,
               colors = contrasting_colors,
               text = ~paste("Date: ", date, "<br>Mean: ", round(daily_avg, 2)),
               hoverinfo = "text") %>%
    add_trace(type = "scatter", mode = "lines", name = ~member_casual, showlegend = (y == unique(data$year)[1])) %>%
    add_trace(type = "scatter", mode = "markers", name = ~member_casual, showlegend = FALSE) %>%
    layout(
      annotations = list(
        x = 0.75,
        y = 0.95, # Adjusted y value (lower than 1.05)
        text = paste("Year:", y),
        xref = "paper",
        yref = "paper",
        showarrow = FALSE
      ),
      xaxis = list(title = list(text = ""), tickformat = "%b"),
      yaxis = list(title = list(text = ""))
    )
  return(p)
})

subplot(plots, nrows = 1, shareX = TRUE, titleX = TRUE) %>%
  layout(title = list(text = "Average Ride Length by Date and User Type"),
         width = 800,  # Adjust this value
         height = 400) # Adjust this value

KEY TAKEAWAYS

  • Again, members’ average daily bike usage patterns are more consistent and stable than casual riders. The average ride length for casual riders appears more volatile and often reaches higher peaks than for members.

  • The daily average ride length is much higher for casual riders than for members.

  • Casual riders show seasonal variation in ride lengths, with greater fluctuations in the spring (March-April) compared to winter (January-February). Notably, average ride lengths increased in the spring of 2020 compared to 2019, potentially linked to broader societal changes.

  • The daily fluctuations suggest that factors such as weather, day of the week, and events likely influence average ride duration.

NUMBER OF TRIPS (ANNUAL COMPARISON)

This section will investigate the annual behavior of the number of trips.

trips_by_user <- all_trips_v2 %>%
  group_by(member_casual, year) %>%
  summarise(number_of_rides = n(), .groups = "drop")

# Rows are sorted: casual 2019, casual 2020, member 2019, member 2020
growth_casual <- (trips_by_user$number_of_rides[2] - trips_by_user$number_of_rides[1]) / trips_by_user$number_of_rides[1]

growth_member <- (trips_by_user$number_of_rides[4] - trips_by_user$number_of_rides[3]) / trips_by_user$number_of_rides[3]

ggplot(trips_by_user, aes(x = member_casual, y = number_of_rides, fill = as.factor(year))) +
  geom_col(position = "dodge") +
  geom_text(aes(label = format(number_of_rides, big.mark = ",", trim = TRUE, scientific = FALSE), # Bar labels without scientific notation
                group = as.factor(year)),
            position = position_dodge(width = 0.9),
            vjust = -0.3,
            size = 4) +
  labs(title = "Number of Rides (Annual Comparison)",
       y = NULL, # Remove the y-axis label
       fill = "Year") +
  theme(axis.title.y = element_blank(),   # Remove y-axis title
        axis.text.y = element_blank(),    # Remove y-axis labels
        axis.ticks.y = element_blank()) + # Remove y-axis ticks
  annotate("text", x = "casual", y = 80000,
           label = paste0("\u2191 (", round(growth_casual * 100, 2), "%)"),
           size = 4, color = "DarkGreen") +
  annotate("text", x = 1 + 0.5, y = 370000,
           label = paste0("\u2191 (", round(growth_member * 100, 2), "%)"),
           size = 4, color = "DarkGreen")
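The chunk above picks the four counts by row position ([1] to [4]), so it depends on the sort order of the grouped result. This base-R sketch is a more robust variant: it reshapes the counts into a user-type × year matrix and looks values up by name. The counts are hypothetical, chosen only to reproduce the reported growth rates of 93% and 10.6%.

```r
# Hypothetical counts; the real input is trips_by_user
trips_by_user <- data.frame(
  member_casual   = c("casual", "casual", "member", "member"),
  year            = c("2019", "2020", "2019", "2020"),
  number_of_rides = c(100, 193, 1000, 1106),
  stringsAsFactors = FALSE
)

# Matrix with rows casual/member and columns 2019/2020
rides <- tapply(trips_by_user$number_of_rides,
                list(trips_by_user$member_casual, trips_by_user$year), sum)

# Year-over-year growth per user type, looked up by name rather than position
growth <- (rides[, "2020"] - rides[, "2019"]) / rides[, "2019"]
print(round(growth * 100, 1))  # casual 93.0, member 10.6
```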

KEY TAKEAWAYS

  • Rapid Growth of Casual Users: A 93% increase in trips by casual users means this group nearly doubled its number of trips from one year to the next. This growth suggests much greater adoption or use of the service by non-subscribed users.

  • Modest but Sustained Growth of Members: A 10.6% increase in member trips is also positive, indicating that the subscribed user base is growing or using the service more often. Although the percentage is lower than for casual users, it represents a large absolute increase because the member base is much larger.

  • Different Growth Dynamics: The significant disparity in growth percentages suggests that the factors driving service usage might affect casual users and members differently. There could be marketing campaigns or external factors that are especially attracting non-subscribed users, or perhaps members’ usage patterns are more stable.

BIKE TRIPS DISTRIBUTION PER HOUR (PEAK TIMES USAGE)

In the following chunk, we convert the character field started_at to a POSIXct date-time, which makes time operations easier. Here, we use it to extract the hour at which each ride begins as a new field.

# Convert to POSIXct to keep both date and time
all_trips_v2$start_at_datetime <- as.POSIXct(all_trips_v2$started_at, format = "%Y-%m-%d %H:%M:%S")
# Extract the hour from 'start_at_datetime' (hour() comes from lubridate)
all_trips_v2 <- all_trips_v2 %>%
  mutate(hour_started = hour(start_at_datetime))

# Convert 'hour_started' to a factor for proper binning in ggplot
all_trips_v2$hour_started <- factor(all_trips_v2$hour_started, levels = 0:23)
# Create a bar plot to visualize the distribution by hour
ggplot(all_trips_v2, aes(x = hour_started, fill = member_casual)) +
  geom_bar(position = "stack") +
  labs(title = "Bike Trip Distribution by Hour and User Type",
       x = "Hour of Day",
       y = "Trip Frequency",
       fill = "User Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
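The same hour extraction can be done without lubridate, because a POSIXlt object stores the hour as a component. This base-R sketch uses three started_at values from the glimpse() output. It also shows why the factor with levels 0:23 matters: table() then reports every hour of the day, including hours with no rides.

```r
# Three started_at values from the glimpse() output
started_at <- c("2020-01-21 20:06:59", "2020-01-30 14:22:39", "2019-01-01 0:04:37")
start_time <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Base-R counterpart of lubridate::hour()
hour_started <- as.POSIXlt(start_time)$hour
print(hour_started)  # 20 14 0

# Fixed levels 0:23 keep empty hours in the count table
counts <- table(factor(hour_started, levels = 0:23))
print(counts)
```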

KEY TAKEAWAYS

  • Members’ maximum bike usage is during peak hours (7:00 am - 9:00 am and 4:00 pm - 6:00 pm).

  • Casual riders’ maximum bike usage is during the afternoon, essentially from 12:00 pm to 6:00 pm.

  • Note that the number of rides is significantly higher for members throughout the day.

KEY FINDINGS

RIDERSHIP PATTERNS
  • Members demonstrate more consistent daily ride lengths, while casual riders exhibit greater volatility. The daily fluctuations suggest that factors such as the day of the week and weather have a stronger influence on casual riders’ usage.

  • Members’ total bike usage is much higher than that of casual riders.

  • The following patterns suggest that casual riders use bikes for leisure or tourism, while members rely on bikes for commuting or transportation.

    • The usage patterns differ consistently across every period analyzed: members take frequent short trips (median 8.4 min), while casual riders take longer trips (median 22.9 min) but far fewer of them.
    • Casual riders use bikes on weekends and afternoons, while members use bikes essentially on weekdays and during peak hours.
  • Casual riders show a general seasonal pattern and a specific increase in spring 2020.

  • Members use the most popular routes, which are the most efficient and well-traveled.

  • The most used stations by casual riders are:

    Lake Shore Dr. & Monroe St as the starting point

    Streeter Dr. & Grand Ave as the ending point

RELATIONSHIPS
  • Per-ride statistics (median and mean ride length) are higher for casual riders, who take longer but fewer trips. Totals (total ride time and number of rides) are much higher for members, who ride far more often.

6. RECOMMENDATIONS

  1. Offer a “Leisure to Loyalty” Membership Trial: Capitalize on casual riders’ tendency to use bikes for longer trips on weekends and afternoons. Offer a limited-time membership trial targeting casual users during these peak leisure times. This trial could provide benefits like discounted rates for longer durations, free weekend rentals after a certain number of casual rides, or access to member-only routes or curated leisure ride suggestions. Promote this trial through the app when casual riders start longer trips or on weekend afternoons.

  2. Implement a Tiered Membership System with Leisure-Focused Benefits: Introduce a membership tier that caters specifically to the leisure use patterns of casual riders. This could be a more affordable option with benefits like extended rental times on weekends, discounts at partner leisure destinations (cafes near popular routes, parks), or the ability to reserve bikes in advance for weekend outings. Clearly communicate the value proposition of this tier compared to the standard membership, highlighting the cost savings for their typical usage.

  3. Personalized Conversion Campaigns Based on Usage Patterns: Leverage the data on casual riders’ preferred starting and ending points (e.g., Lake Shore Dr. & Monroe St., Streeter Dr. & Grand Ave) and their tendency to ride more in warmer months. Implement personalized in-app messages or email campaigns during these times, highlighting membership benefits for their specific leisure routes and frequency. For example, suggest cost savings if they were members based on their past usage, or offer discounts on annual memberships as the weather gets warmer and their riding increases.